This webpage provides supporting information for the manuscript “Phylogenetic generalised linear mixed-effects modelling with glmmTMB R package”. The webpage is organised into three sections: (1) First, we describe the simulated models and data-generating mechanisms for each of the five evaluated R packages for fitting PGLMMs (2) We then use the simulated data to illustrate compilation and sampling runtimes for the Bayesian MCMC models. (3) Finally, we presents two case studies illustrating the application of these methods to ecological and evolutionary data, the first on bird color traits and the second on plant traits.
We demonstrate and explain how to fit phylogenetic generalized mixed models (PGLMM) using five different packages which we used in our simulation study: glmmTMB, phyr, MCMCglmm, brms, and INLA. The models are fitted to one simulated dataset using a randomly generated phylogenetic tree and a set of model parameters. We compare the performance of these packages in terms of runtime, accuracy, and bias of the fixed effect mean and random effect variance estimates.
We assume a set of trait measures \(y_{ij}\) repeated measure (observation) \(i\) and for each species \(j\). The model is specified as follows:
\(p_i \sim \mathcal{N}(0, \sigma^2_{\text{p}} \mathbf{A})\): species-level effect (phylogenetic), where \(\mathbf{A}\) is the phylogenetic correlation matrix
First we specify the parameter values for one simulation run. We assume the following:
Code
### set up model parameters seed <-1b0 <-1# fixed effect interceptb1 <-1.5# fixed effect slopek.species <-30# number of speciesn.reps <-10# number of repeated measures per species (assuming a balanced design)sigma2.n <-0.25# variance of non-phylogenetic effectsigma2.p <-0.25# variance of phylogenetic effectsigma2.e <-0.2# residual error variance
Simulate data based on these parameter values:
Code
set.seed(seed)# set up k.obs (number of observations per species)k.obs <- n.reps # assumes a balanced design# species id sp.id <-rep(seq_len(k.species), times=k.obs)# total number of observationsn <-length(sp.id)# simulate simple dataframe with covariate (x) variablex <-runif(n, 10, 20) dat <-data.frame(obs =1:n, x = x, species = sp.id)# simulate tree and obtain phylo matrixtree <-rtree(k.species, tip.label =seq_len(k.species))tree <-compute.brlen(tree, power=1) # power of 1 means ultra-metric treephylo.mat <-vcv(tree, corr =TRUE) ## we want a correlation matrix (bounded by -1 and 1)phylo.mat <- phylo.mat[order(as.numeric(rownames(phylo.mat))), order(as.numeric(rownames(phylo.mat)))]# Simulate response variable (phen) based on cofactor and phylogenetic matrixu.s <-rnorm(k.species, 0, sqrt(sigma2.n))[sp.id]u.p <-mvrnorm(1, mu=rep(0, k.species), Sigma=sigma2.p*phylo.mat)[sp.id]ei <-rnorm(n, 0, sqrt(sigma2.e))# get estimates of yyi <- b0 + b1*x + u.s + u.p + ei### append all to dataframedat <-cbind(dat, u.s, u.p, ei, b0, b1, yi)dat$phylo <- dat$species # phylo ID variable (same as species) - needs to be numeric to work with INLAdat$species <-factor(dat$species) # format species variable for modelsdat$sp <- dat$species # create sp variable (for phyr)dat$g <-1# add variable g constant (for glmmTMB)
To fit a PGLMM using glmmTMB package we specify the species-level random effect with phylogenetic relatedness using the propto covariance structure. In the first part we specify the random intercept (0 + species), followed by the grouping variable g (which we assume here as constant), and then the phylogenetic correlation matrix phylo.mat.
The model output will provide the fixed effect estimates, random effect variance estimates, and the residual variance estimate. The random effect variance estimates will be on the standard deviation scale, so we square them to get the variance estimates.
Code
# repeated measures per speciestime.glmmTMB <-system.time({ model_glmmTMB <-glmmTMB(yi ~ x + (1|species) +propto(0+ species|g, phylo.mat),data = dat,REML =TRUE)})
Check if the model converged (i.e. returns TRUE if the Hessian is positive definite, for more details read here: https://cran.r-project.org/web/packages/glmmTMB/vignettes/troubleshooting.html):
Code
model_glmmTMB$sdr$pdHess
[1] TRUE
Run model with phyr
To fit the model using phyr package we use the cov_ranef argument to specify the phylogenetic tree directly. The random effect term (1|sp__) indicates that we want to include both phylogenetic and non-phylogenetic random effects. The underscores __ indicate that we want to include phylogenetic correlations in the model.
To fit the MCMCglmm model we first we need to set up a precision matrix for the phylogenetic random effect which is required by MCMCglmm. We use the inverseA function from the ape package to obtain the inverse of the phylogenetic covariance matrix. The prior list specifies the priors for the random effects and residual variance. The ginverse argument is used to specify the inverse of the phylogenetic covariance matrix. Then we specify the number of iterations, burn-in, and thinning parameters for the MCMC sampling. Here we increased the number of iterations by 30 times the default value to ensure convergence and stability of the MCMC chains, but this could be further increased. Usually you would want to set these to some reasonably large values.
Code
# get precision phylo matrix and order rowsphylo.prec.mat <-inverseA(tree, nodes ="TIPS", scale =TRUE)$Ainvphylo.prec.mat <- phylo.prec.mat[order(as.numeric(rownames(phylo.prec.mat))),order(as.numeric(rownames(phylo.prec.mat)))]# set recommended priors with two random effectsprior <-list(G=list(G1=list(V=1,nu=1,alpha.mu=0,alpha.V=1000), G2=list(V=1,nu=1,alpha.mu=0,alpha.V=1000)),R=list(V=1,nu=0.02))# fit MCMCglmm modeltime.mcmc <-system.time({ model_mcmc <-MCMCglmm(yi ~ x,random =~species + phylo,family ="gaussian",ginverse =list(phylo = phylo.prec.mat),prior = prior,data = dat,verbose =FALSE,nitt =303000, # increase default by x30burnin =3000, # defaultthin =10) # default})
Next we check the effective sample size (ESS) and the Heidelberger diagnostic test which checks if the MCMC chains have converged and are stationary. The ESS should be above 400 for both fixed and random effects (Vehtari et al., 2021), and the Heidelberger diagnostic should return TRUE for all parameters.
Stationarity start p-value
test iteration
(Intercept) passed 1 0.184
x passed 1 0.580
species passed 1 0.767
phylo passed 1 0.855
units passed 1 0.490
Halfwidth Mean Halfwidth
test
(Intercept) passed 0.845 0.005440
x passed 1.510 0.000114
species passed 0.149 0.002052
phylo passed 0.660 0.017493
units passed 0.208 0.000203
Run model with brms
To fit the model using the brms package we specify the random effects using the (1|species) term for non-phylogenetic random effects and gr(phylo, cov = phylo.mat) for phylogenetic random effects. The data2 argument is used to pass the phylogenetic correlation matrix to the model (the same matrix format as for glmmTMB). We set the number of iterations, chains, and cores to reasonable values to ensure convergence and stability of the MCMC chains. Here we increased the number of iterations by 10 times the default value to ensure convergence and stability of the MCMC chains, but this could be further increased (and should be if ESS are low and Rhat values are higher than 1.01).
Code
time.brms <-system.time({ model_brms <-brm(yi ~ x + (1|species) + (1|gr(phylo, cov = phylo.mat)), #phylo.mat is the correlation matrixdata = dat,family =gaussian(),chains =4, # defaultiter =20000, # increased default x10cores =4, # equal to number of chainsdata2 =list(phylo.mat = phylo.mat))})
Check that the maximum effective sample size (ESS) of the model is high enough and check the Rhat value is below 1.01 (Vehtari et al. 2021):
# check RHat value is below 1.01 (Vehtari et al. 2021)max(rhat(model_brms))<1.01
[1] TRUE
Code
# alternatively, we can use the bayestestR package to check the diagnostics of fixed effectsprint(bayestestR::diagnostic_posterior(model_brms), digits =4)
For the INLA package we set up the model using the f() function to specify the random effects. The model = "iid" argument indicates that we want to use an independent and identically distributed (iid) random effect for species, while the model = "generic0" argument is used for the phylogenetic random effect. The Cmatrix argument is used to specify the phylogenetic correlation matrix. We note that priors can be
Code
# set up recommended penalizing complexity priors pcprior =list(prec =list(prior="pc.prec", param =c(20, 0.1)))time.inla <-system.time({ model_inla <-inla(yi ~ x +f(species, model ="iid") +f(phylo, ## this needs to be a numeric to workmodel ="generic0",Cmatrix = phylo.prec.mat,hyper=pcprior),family ="gaussian",data = dat)})fit_inla <-summary(model_inla)
Extract model estimates
First we extract the covariate \(x\) estimate and confidence interval for each model:
Code
## Format run time for each model ---------time.phyr <-as.numeric(time.phyr[3])time.glmmTMB <-as.numeric(time.glmmTMB[3])time.brms <-as.numeric(time.brms[3])time.mcmc <-as.numeric(time.mcmc[3])time.inla <-as.numeric(time.inla[3])## Fixed effect estimates -------# phyrcoefs_phyr <-as.data.frame(fixef(model_phyr))coefs_phyr$conf.low[2] <- coefs_phyr$Value[2] - coefs_phyr$Std.Error[2]*1.96coefs_phyr$conf.high[2] <- coefs_phyr$Value[2] + coefs_phyr$Std.Error[2]*1.96# glmmTMBcoefs_tmb <-as.data.frame(confint(model_glmmTMB, parm="beta_"))# brmscoefs_brm <-as.data.frame(tidy(model_brms, effects="fixed", conf.int=TRUE))# MCMCglmm coefs_mcmc <-as.data.frame(tidy(model_mcmc, effects="fixed", conf.int=TRUE))# INLAcoefs_inla <-as.data.frame(fit_inla$fixed)## Combine fixed effect results -----------------res_fixed <-data.frame(model =c("phyr", "glmmTMB", "brms", "MCMCglmm", "INLA"),species_size = k.species,sample_size = n,run_time =c(time.phyr, time.glmmTMB, time.brms, time.mcmc, time.inla),b0 =rep(dat$b0[1], 5),b1 =rep(dat$b1[1], 5),mu =c(coefs_phyr$Value[2], coefs_tmb$Estimate[2], coefs_brm$estimate[2], coefs_mcmc$estimate[2], coefs_inla$mean[2]),mu_ci_low =c(coefs_phyr$conf.low[2], coefs_tmb$`2.5 %`[2], coefs_brm$conf.low[2], coefs_mcmc$conf.low[2], coefs_inla$`0.025quant`[2]),mu_ci_high =c(coefs_phyr$conf.high[2], coefs_tmb$`97.5 %`[2], coefs_brm$conf.high[2], coefs_mcmc$conf.high[2], coefs_inla$`0.975quant`[2]),stringsAsFactors =FALSE)## Show table of results of runtime and fixed effects ------kable(res_fixed, caption ="Runtime and fixed effect estimates of the simulated model",col.names =c("Model", "Species size", "Sample size", "Run time (s)", "b0", "b1", "Estimate (b1)", "CI low (b1)", "CI high (b1)"),digits =3,format ="html")
Runtime and fixed effect estimates of the simulated model
Model
Species size
Sample size
Run time (s)
b0
b1
Estimate (b1)
CI low (b1)
CI high (b1)
phyr
30
300
0.55
1
1.5
1.51
1.491
1.530
glmmTMB
30
300
1.00
1
1.5
1.51
1.491
1.530
brms
30
300
489.67
1
1.5
1.51
1.490
1.530
MCMCglmm
30
300
134.45
1
1.5
1.51
1.490
1.530
INLA
30
300
5.64
1
1.5
1.51
1.490
1.529
Extract the variance component estimates for each model and compare:
Code
#### Random effect estimates -------# get phyr random effect variance estimates var_re_phyr <-c(as.numeric(model_phyr$ss[2])^2, #phylogenetic as.numeric(model_phyr$ss[1])^2, #non-phylogenetic as.numeric(model_phyr$ss[3])^2) #residual # combine into dataframesigma2_phyr <-data.frame(model ="phyr",group =c("phylo", "species", "Residual"),term ="var",estimate = var_re_phyr,std.error =NA,conf.low =NA, conf.high =NA)# get glmmTMB random effect variance estimates (by default it is on the standard deviation scale)re_tmb <-as.data.frame(confint(model_glmmTMB, parm="theta_"))species_tmb <- re_tmb[1, ]phylo_tmb <- re_tmb[2, ]# combine into dataframesigma2_tmb <-data.frame(model ="glmmTMB",group =c("phylo", "species", "Residual"),term ="var",estimate =c(phylo_tmb$Estimate^2, #phylo variance estimates species_tmb$Estimate^2, #non-phylo variance estimatessigma(model_glmmTMB)^2),std.error =NA, #conf.low =c(phylo_tmb$`2.5 %`, species_tmb$`2.5 %`, NA), # Replace with the residual var CI if availableconf.high =c(phylo_tmb$`97.5 %`, species_tmb$`97.5 %`, NA) # Replace with the residual var CI if available)# Compute variance, SE (delta method), and CI on variance scale)var_est <- re_tmb$Estimate^2var_se <-2* re_tmb$Estimate * (re_tmb$`97.5 %`- re_tmb$`2.5 %`) / (2*1.96)var_ci_low <- re_tmb$`2.5 %`^2var_ci_high <- re_tmb$`97.5 %`^2# Residual varianceresid_var <-sigma(model_glmmTMB)^2# Combine resultssigma2_tmb <-data.frame(model ="glmmTMB",group =c("phylo", "species", "Residual"),term ="var",estimate =c(var_est, resid_var),std.error =c(var_se, NA),conf.low =c(var_ci_low, NA),conf.high =c(var_ci_high, NA))# get brms random effect variance estimates (standard deviation scale)sigma_brms <-tidy(model_brms, effects="ran_pars")sigma2_brms <- sigma_brms %>%mutate(model="brms",term=str_replace(term, "sd", "var"),estimate=estimate^2) %>%##compute variance estimates dplyr::select(model, group, term, estimate, std.error, conf.low, conf.high)# get MCMCglmm random effect estimates (variance scale)sigma2_mcmc <-tidy(model_mcmc, effects="ran_pars", conf.int=TRUE)sigma2_mcmc <- sigma2_mcmc %>%mutate(model="MCMCglmm",group=str_replace(group,"animal", "phylo")) %>% dplyr::select(model, group, term, estimate, std.error, conf.low, conf.high)# get INLA random effect estimates (precision scale i.e. inverse variance)re_inla <-1/fit_inla$hyperparsigma2_inla <-data.frame(model ="INLA",group =c( "Residual", "species", "phylo"),term ="var",estimate = re_inla$mean,std.error = fit_inla$hyperpar$sd,conf.low = re_inla$`0.025quant`,conf.high = re_inla$`0.975quant`)# merge fixed results togethers2 <-as.data.frame(rbind(sigma2_phyr, sigma2_tmb, sigma2_brms, sigma2_mcmc, sigma2_inla))# get subsets for each groups2_phylo <- s2 %>%filter(group=="phylo")s2_sp <- s2 %>%filter(group=="species")s2_res <- s2 %>%filter(group=="Residual")## Combine random effect var estimates results -----------------res_rand <-data.frame(model =c("phyr", "glmmTMB", "brms", "MCMCglmm", "INLA"),species_size = k.species,sample_size = n,run_time =c(time.phyr, time.glmmTMB, time.brms, time.mcmc, time.inla),sigma2_phylo = s2_phylo$estimate,sigma2_species = s2_sp$estimate,sigma2_residual = s2_res$estimate,stringsAsFactors =FALSE)## Show table of results of runtime and random variance ------kable(res_rand, caption ="Runtime and random component variance estimates of the simulated model",col.names =c("Model", "Species size", "Sample size", "Run time (s)", "Phylo variance est.", "Non-phylo variance est.", "Residual variance est."),digits =3,format ="html")
Runtime and random component variance estimates of the simulated model
Model
Species size
Sample size
Run time (s)
Phylo variance est.
Non-phylo variance est.
Residual variance est.
phyr
30
300
0.55
0.346
0.639
0.206
glmmTMB
30
300
1.00
0.132
0.427
0.206
brms
30
300
489.67
0.568
0.140
0.208
MCMCglmm
30
300
134.45
0.660
0.149
0.208
INLA
30
300
5.64
0.549
0.077
0.205
3 Bayesian MCMC model runtime
Here we provide a breakdown of the compilation and sampling runtimes for the Bayesian MCMC models using the above simulated repeated measures dataset. The runtimes are provided for each model fitted using the MCMCglmm and brms packages. Note: the compilation time is the time taken to compile the model, while the sampling time is the time taken to sample from the posterior distribution of the model parameters.
First we set up the MCMCglmm model.
MCMCglmm
Set up MCMCglmm tuning parameters - usually set to some reasonably large values through trial and error.
The following case studies illustrate how phylogenetic mixed models can be applied to diverse questions in ecology and evolution. The first examines patterns of plumage colour in birds, while the second explores macroevolutionary patterns in plant traits. Together, they demonstrate the flexibility of these methods across taxa and research questions.
All models, results, and datasets presented in these case studies are intended for illustrative purposes only. They are designed to demonstrate analytical approaches rather than to provide definitive biological insights. No substantive conclusions should be drawn from these analyses.
4.2 Case study 1: Bird color evolution
Birds are a valuable system for studying ecological and evolutionary processes because their diversity in form, behaviour, and coloration is well documented across many species and habitats. Variation in plumage colour, in particular, can provide insights into mechanisms such as sexual selection, camouflage, and signalling. By examining these traits, researchers can better understand how environmental and evolutionary pressures shape biodiversity.